
Evaluation of LLM-based Explanations for a Learning Analytics Dashboard

Deriyeva, Alina, Paassen, Benjamin

arXiv.org Artificial Intelligence

Learning Analytics Dashboards can be a powerful tool to support self-regulated learning in Digital Learning Environments and to promote the development of meta-cognitive skills such as reflection. However, their effectiveness can be limited by the interpretability of the data they present. To aid interpretation, we employ a large language model (LLM) to generate verbal explanations of the data in the dashboard and evaluate them against a standalone dashboard and explanations provided by human teachers in an expert study with university-level educators (N=12). We find that the LLM-based explanations of the skill state presented in the dashboard, as well as its general recommendations on how to proceed with learning within the course, are significantly preferred over the other conditions. This indicates that using LLMs for interpretation can enhance the learning experience for learners while maintaining the pedagogical standards approved by teachers.


GraphMASAL: A Graph-based Multi-Agent System for Adaptive Learning

Zeng, Biqing, Liu, Mengquan, Zhen, Zongwei

arXiv.org Artificial Intelligence

The advent of Intelligent Tutoring Systems (ITSs) has marked a paradigm shift in education, enabling highly personalized learning pathways. However, true personalization requires adapting to learners' complex knowledge states (multi-source) and diverse goals (multi-sink); existing ITSs often lack the structural-reasoning capability and knowledge dynamism needed to generate genuinely effective learning paths, and they lack scientifically rigorous validation paradigms. In this paper, we propose GraphMASAL (A Graph-based Multi-Agent System for Adaptive Learning), which integrates (i) a dynamic knowledge graph for persistent, stateful learner modeling; (ii) a LangGraph-orchestrated trio of agents (Diagnostician, Planner, Tutor); (iii) a knowledge-graph-grounded two-stage neural IR component (dual-encoder dense retrieval with cross-encoder listwise re-ranking and calibrated score fusion); and (iv) a multi-source multi-sink (MSMS) planning engine with a cognitively grounded cost function and an approximation guarantee via greedy set cover. Under blinded automated evaluations with matched inputs and inference settings across diverse student profiles, GraphMASAL consistently outperforms LLM prompting and structured ablations in planning--achieving stronger structural/sequence alignment of learning paths, higher coverage of weak concepts, and lower learning cost--while also surpassing prompt-based baselines in cognitive diagnosis. Agreement with expert/LLM-proxy ratings further supports the validity of our evaluation protocol. These findings indicate that grounding LLM agents in a dynamic knowledge graph, coupled with optimization under educational constraints, yields reliable, interpretable, and pedagogically plausible learning plans, advancing personalized and goal-oriented education.
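The greedy set cover mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's actual planner or cost function: the lesson names, costs, and concept sets below are invented, and the abstract's "cognitively grounded cost" is replaced by a plain per-lesson cost. The greedy rule (pick the lesson with the lowest cost per newly covered concept) is the classic ln(n)-approximation for weighted set cover.

```python
# Hedged sketch: a generic greedy weighted set-cover planner, illustrating the
# kind of approximation-guaranteed path selection the abstract describes.
# Lesson names, costs, and concept sets are hypothetical, not from the paper.

def greedy_set_cover(universe, lessons):
    """Select lessons until every weak concept in `universe` is covered.

    `lessons` maps a lesson name to (cost, set of concepts it covers).
    Greedily picks the lesson with the lowest cost per newly covered
    concept, the classic ln(n)-approximation for weighted set cover.
    """
    uncovered = set(universe)
    plan, total = [], 0.0
    while uncovered:
        candidates = [(name, lc) for name, lc in lessons.items()
                      if lc[1] & uncovered]
        if not candidates:
            raise ValueError("remaining concepts cannot be covered")
        # ratio = lesson cost / number of still-uncovered concepts it adds
        name, (cost, concepts) = min(
            candidates, key=lambda it: it[1][0] / len(it[1][1] & uncovered))
        plan.append(name)
        total += cost
        uncovered -= concepts
    return plan, total

weak_concepts = {"recursion", "pointers", "big-O"}
catalog = {
    "L1": (2.0, {"recursion"}),
    "L2": (3.0, {"recursion", "big-O"}),
    "L3": (1.0, {"pointers"}),
}
plan, cost = greedy_set_cover(weak_concepts, catalog)
print(plan, cost)  # best cost-per-concept ratio first: L3, then L2
```

In GraphMASAL the setting is richer (multiple sources and sinks, a learner-state-dependent cost), but the same greedy argument underwrites the approximation guarantee the authors cite.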


Pedagogy-driven Evaluation of Generative AI-powered Intelligent Tutoring Systems

Maurya, Kaushal Kumar, Kochmar, Ekaterina

arXiv.org Artificial Intelligence

The interdisciplinary research domain of Artificial Intelligence in Education (AIED) has a long history of developing Intelligent Tutoring Systems (ITSs) by integrating insights from technological advancements, educational theories, and cognitive psychology. The remarkable success of generative AI (GenAI) models has accelerated the development of large language model (LLM)-powered ITSs, which have the potential to imitate human-like, pedagogically rich, and cognitively demanding tutoring. However, the progress and impact of these systems remain largely untraceable due to the absence of reliable, universally accepted, and pedagogy-driven evaluation frameworks and benchmarks. Most existing educational dialogue-based ITS evaluations rely on subjective protocols and non-standardized benchmarks, leading to inconsistencies and limited generalizability. In this work, we take a step back from mainstream ITS development and provide a comprehensive overview of state-of-the-art evaluation practices, highlighting associated challenges through real-world case studies from careful and caring AIED research. Finally, building on insights from previous interdisciplinary AIED research, we propose three practical, feasible, and theoretically grounded research directions, rooted in learning science principles and aimed at establishing fair, unified, and scalable evaluation methodologies for ITSs.


Multi-Stakeholder Alignment in LLM-Powered Collaborative AI Systems: A Multi-Agent Framework for Intelligent Tutoring

Uchoa, Alexandre P, Oliveira, Carlo E T, Motta, Claudia L R, Schneider, Daniel

arXiv.org Artificial Intelligence

The integration of Large Language Models into Intelligent Tutoring Systems presents significant challenges in aligning with diverse and often conflicting values from students, parents, teachers, and institutions. Existing architectures lack formal mechanisms for negotiating these multi-stakeholder tensions, creating risks in accountability and bias. This paper introduces the Advisory Governance Layer (AGL), a non-intrusive, multi-agent framework designed to enable distributed stakeholder participation in AI governance. The AGL employs specialized agents representing stakeholder groups to evaluate pedagogical actions against their specific policies in a privacy-preserving manner, anticipating future advances in personal assistant technology that will enhance stakeholder value expression. Through a novel policy taxonomy and conflict-resolution protocols, the framework provides structured, auditable governance advice to the ITS without altering its core pedagogical decision-making. This work contributes a reference architecture and technical specifications for aligning educational AI with multi-stakeholder values, bridging the gap between high-level ethical principles and practical implementation.


BacPrep: An Experimental Platform for Evaluating LLM-Based Bacalaureat Assessment

Dumitran, Adrian Marius, Dita, Radu

arXiv.org Artificial Intelligence

Accessing quality preparation and feedback for the Romanian Bacalaureat exam is challenging, particularly for students in remote or underserved areas. This paper introduces BacPrep, an experimental online platform exploring the potential of Large Language Models (LLMs) for automated assessment, aiming to offer a free, accessible resource. Using official exam questions from the last 5 years, BacPrep employs one of Google's newest models, Gemini 2.0 Flash (released Feb 2025), guided by official grading schemes, to provide experimental feedback. The platform is currently operational; its primary research function is to collect student solutions and LLM outputs. This focused dataset is vital for planned expert validation to rigorously evaluate the feasibility and accuracy of this cutting-edge LLM in the specific Bacalaureat context before reliable deployment.


A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges

Zerkouk, Meriem, Mihoubi, Miloud, Chikhaoui, Belkacem

arXiv.org Artificial Intelligence

AI-based Intelligent Tutoring Systems (ITS) have significant potential to transform teaching and learning. As efforts continue to design, develop, and integrate ITS into educational contexts, mixed results about their effectiveness have emerged. This paper provides a comprehensive review to understand how ITS operate in real educational settings and to identify the associated challenges in their application and evaluation. We use a systematic literature review method to analyze qualifying studies published between 2010 and 2025, examining domains such as pedagogical strategies, NLP, adaptive learning, student modeling, and domain-specific applications of ITS. The results reveal a complex landscape regarding the effectiveness of ITS, highlighting both advancements and persistent challenges. The study also identifies a need for greater scientific rigor in experimental design and data analysis. Based on these findings, suggestions for future research and practical implications are proposed.


Enhancing tutoring systems by leveraging tailored promptings and domain knowledge with Large Language Models

Balavar, Mohsen, Yang, Wenli, Herbert, David, Yeom, Soonja

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence (AI) and machine learning have reignited interest in their impact on Computer-based Learning (CBL). AI-driven tools like ChatGPT and Intelligent Tutoring Systems (ITS) have enhanced learning experiences through personalisation and flexibility. ITSs can adapt to individual learning needs and provide customised feedback based on a student's performance, cognitive state, and learning path. Despite these advances, challenges remain in accommodating diverse learning styles and delivering real-time, context-aware feedback. Our research aims to address these gaps by integrating skill-aligned feedback via Retrieval Augmented Generation (RAG) into prompt engineering for Large Language Models (LLMs) and developing an application to enhance learning through personalised tutoring in a computer science programming context. The pilot study evaluated a proposed system using three quantitative metrics: readability score, response time, and feedback depth, across three programming tasks of varying complexity. The system successfully sorted simulated students into three skill-level categories and provided context-aware feedback. This targeted approach demonstrated better effectiveness and adaptability compared to general methods.
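The skill-aligned RAG feedback idea described above can be sketched as a toy pipeline: a bag-of-words retriever picks the most relevant reference snippet for the student's submission, and a skill-level bucket chooses the prompt framing. The snippet store, score thresholds, and template wording here are hypothetical illustrations; the paper's actual retriever, corpus, and prompts may differ substantially.

```python
# Hedged sketch of retrieval-augmented, skill-aligned prompt assembly.
# The reference snippets, skill thresholds, and template text are invented
# for illustration and are not the system's actual components.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank stored snippets by word overlap with the query; return top k."""
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(submission: str, score: float, docs: list[str]) -> str:
    """Bucket the student by score and assemble a context-grounded prompt."""
    level = "beginner" if score < 0.4 else "intermediate" if score < 0.8 else "advanced"
    context = "\n".join(retrieve(submission, docs))
    return (f"You are a programming tutor. Student level: {level}.\n"
            f"Reference material:\n{context}\n"
            f"Give {level}-appropriate feedback on:\n{submission}")

docs = ["loops iterate over sequences",
        "recursion calls a function from itself"]
prompt = build_prompt("my recursion function never stops", score=0.3, docs=docs)
print(prompt)
```

A production system would swap the bag-of-words scorer for dense embeddings and pass the assembled prompt to an LLM; the shape of the pipeline (retrieve, bucket, template) stays the same.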


An overview of artificial intelligence in computer-assisted language learning

Katinskaia, Anisia

arXiv.org Artificial Intelligence

Computer-assisted language learning -- CALL -- is an established research field. We review how artificial intelligence can be applied to support language learning and teaching. The need for intelligent agents that assist language learners and teachers is increasing: the human teacher's time is a scarce and costly resource, which does not scale with growing demand. Further factors contribute to the need for CALL: pandemics and increasing demand for distance learning, migration of large populations, the need for sustainable and affordable support for learning, etc. CALL systems are made up of many components that perform various functions, and AI is applied to many different aspects in CALL, corresponding to their own expansive research areas. Most of what we find in the research literature and in practical use are prototypes or partial implementations -- systems that perform some aspects of the overall desired functionality. Complete solutions -- most of them commercial -- are few, because they require massive resources. Recent advances in AI should result in improvements in CALL, yet there is a lack of surveys that focus on AI in the context of this research field. This paper aims to present a perspective on the AI methods that can be employed for language learning from a position of a developer of a CALL system. We also aim to connect work from different disciplines, to build bridges for interdisciplinary work.


Investigating the Impact of Personalized AI Tutors on Language Learning Performance

Suh, Simon

arXiv.org Artificial Intelligence

Driven by the global shift towards online learning prompted by the COVID-19 pandemic, Artificial Intelligence (AI) has emerged as a pivotal player in the field of education. Intelligent Tutoring Systems (ITS) offer a new method of personalized teaching, overcoming the limitations of traditional teaching methods. However, concerns arise about the ability of AI tutors to support skill development and engagement during the learning process. In this paper, I conduct a quasi-experiment with a paired-sample t-test on 34 students pre- and post-use of AI tutors in language learning platforms such as Santa and Duolingo, to examine the relationship between students' engagement, academic performance, and satisfaction during a personalized language learning experience.
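The paired-sample t-test the study applies reduces to a one-line formula on the pre/post differences: t = d-bar / (s_d / sqrt(n)) with n - 1 degrees of freedom. The scores below are invented for illustration and are not the study's data.

```python
# Hedged illustration of a paired-sample t-test; the pre/post scores are
# made-up example values, not the paper's measurements.
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """t statistic for paired samples: t = d_bar / (s_d / sqrt(n))."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # stdev() uses the n-1 (sample) denominator, as the test requires
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1  # (t, df)

pre = [62, 70, 55, 68, 74, 60]
post = [70, 72, 61, 75, 80, 63]
t, df = paired_t(pre, post)
print(round(t, 2), df)
```

For these toy numbers t is about 5.59 with df = 5, well past the two-tailed critical value of 2.571 at alpha = 0.05, so the (invented) gain would be significant; with the study's 34 students, df would be 33.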


LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System

Wang, Tianfu, Zhan, Yi, Lian, Jianxun, Hu, Zhengyu, Yuan, Nicholas Jing, Zhang, Qi, Xie, Xing, Xiong, Hui

arXiv.org Artificial Intelligence

Intelligent Tutoring Systems (ITSs) have revolutionized education by offering personalized learning experiences. However, as goal-oriented learning, which emphasizes efficiently achieving specific objectives, becomes increasingly important in professional contexts, existing ITSs often struggle to deliver this type of targeted learning experience. In this paper, we propose GenMentor, an LLM-powered multi-agent framework designed to deliver goal-oriented, personalized learning within ITS. GenMentor begins by accurately mapping learners' goals to required skills using a fine-tuned LLM trained on a custom goal-to-skill dataset. After identifying the skill gap, it schedules an efficient learning path using an evolving optimization approach, driven by a comprehensive and dynamic profile of learners' multifaceted status. Additionally, GenMentor tailors learning content with an exploration-drafting-integration mechanism to align with individual learner needs. Extensive automated and human evaluations demonstrate GenMentor's effectiveness in learning guidance and content quality. Furthermore, we have deployed GenMentor in practice and implemented it as an application. A practical human study with professional learners further highlights its effectiveness in goal alignment and resource targeting, leading to enhanced personalization. Supplementary resources are available at https://github.com/GeminiLight/gen-mentor.
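The skill-gap and path-scheduling steps described above can be sketched minimally: the gap is the set difference between goal skills and the learner's current skills, and a learning path orders gap skills so prerequisites come first. The skill names and prerequisite map below are invented, and GenMentor's actual evolving optimizer is far more elaborate than a topological sort; this only shows the structural skeleton.

```python
# Hedged sketch: skill gap as a set difference, path as a topological order.
# Skill names and the prerequisite map are hypothetical, not from the paper.
from graphlib import TopologicalSorter

def plan_path(goal_skills, learner_skills, prereqs):
    """Order the missing skills so each appears after its prerequisites."""
    gap = set(goal_skills) - set(learner_skills)
    # restrict prerequisites to skills actually in the gap; already-mastered
    # prerequisites impose no ordering constraint
    graph = {s: set(prereqs.get(s, ())) & gap for s in sorted(gap)}
    return list(TopologicalSorter(graph).static_order())

path = plan_path(
    goal_skills=["linear-algebra", "attention", "transformers"],
    learner_skills=["linear-algebra"],
    prereqs={"attention": ["linear-algebra"], "transformers": ["attention"]},
)
print(path)  # "attention" comes before "transformers"
```

Here the learner already knows linear algebra, so only the two missing skills are scheduled, with "attention" placed before the skill that depends on it.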